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Abstract 

When collections of functional data are too large to be exhaustively observed, sur- 
vey sampling techniques provide an effective way to estimate global quantities such as 
the population mean function. Assuming functional data are collected from a finite 
population according to a probabilistic sampling scheme, with the measurements being 
discrete in time and noisy, we propose to first smooth the sampled trajectories with 
local polynomials and then estimate the mean function with a Horvitz-Thompson es- 
timator. Under mild conditions on the population size, observation times, regularity 
of the trajectories, sampling scheme, and smoothing bandwidth, we prove a Central 
bimit Theorem in the space of continuous functions. We also establish the uniform 
consistency of a covariance function estimator and apply the former results to build 
confidence bands for the mean function. The bands attain nominal coverage and are 
obtained through Gaussian process simulations conditional on the estimated covariance 
function. To select the bandwidth, we propose a cross-validation method that accounts 
for the sampling weights. A simulation study assesses the performance of our approach 
and highlights the influence of the sampling scheme and bandwidth choice. 

Keywords : CLT, functional data, local polynomial smoothing, maximal inequalities, space 
of continuous functions, suprema of Gaussian processes, survey sampling, weighted cross- 
validation. 



1 Introduction 



The recent development of automated sensors has given access to very large collections of 
signals sampled at fine time scales. However, exhaustive transmission, storage, and analysis 
of such massive functional data may incur very large investments. In this context, when 
the goal is to assess a global indicator like the mean temporal signal, survey sampling tech- 
niques are appealing solutions as they offer a good trade-off between statistical accuracy 
and global cost of the analysis. In particular they are competitive with signal compression 
techniques (Chiky and Hebrail, 2008). The previous facts provide some explanation why, 
although survey sampling and functional data analysis have been long-established statisti- 
cal fields, motivation for studying them jointly only recently emerged in the literature. In 
this regard Cardot et al. (2010a) examine the theoretical properties of functional principal 
components analysis (FPCA) in the survey sampling framework. Cardot et al. (2010b) 
harness FPCA for model-assisted estimation by relating the unobserved principal compo- 
nent scores to available auxiliary information. Focusing on sampling schemes, Cardot and 
Josserand (2011) estimate the mean electricity consumption curve in a population of about 
19,000 customers whose electricity meters were read every 30 minutes during one week. 
Assuming exact measurements, they first perform a linear interpolation of the discretized 
signals and then consider a functional version of the Horvitz-Thompson estimator. For a 
fixed sample size, they show that estimation can be greatly improved by utilizing stratified 
sampling over simple random sampling and they extend the Neyman optimal allocation rule 
(see e.g. Sarndal et al. 1992) to the functional setup. Note however that the finite-sample 
and asymptotic properties of their estimator rely heavily on the assumption of error-free 
measurements, which is not always realistic in practice. The first contribution of the present 
work is to generalize the framework of Cardot and Josserand (2011) to noisy functional data. 
Assuming curve data are observed with errors that may be correlated over time, we replace 
the interpolation step in their procedure by a smoothing step based on local polynomials. 
As opposed to interpolation, smoothing can effectively reduce the noise level in the data, 
which improves estimation accuracy. We establish a functional CLT for the mean function 
estimator based on the smoothed data and prove the uniform consistency of a related co- 
variance estimator. These results have important applications to the simultaneous inference 
of the mean function. 

In relation to mean function estimation, a key statistical task is to build confidence 
regions. There exists a vast and still active literature on confidence bands in nonparametric 
regression. See e.g. Sun and Loader (1994), Eubank and Speckman (1993), Claeskens and 
van Keilegom (2003), Krivobokova et al. (2010), and the references therein. When data 
are functional the literature is much less abundant. One possible approach is to obtain 
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confidence balls for the mean function in a L 2 -space. Mas (2007) exploits this idea in a 
goodness-of-fit test based on the functional sample mean and regularized inverse covariance 
operator. Using adaptive projection estimators, Bunea et al. (2011) build conservative 
confidence regions for the mean of a Gaussian process. Another approach consists in deriving 
results in a space C of continuous functions equipped with the supremum norm. This allows 
for the construction of confidence bands that can easily be visualized and interpreted, as 
opposed to L 2 -confidence balls. This approach is adopted for example by Faraway (1997) 
to build bootstrap bands in a varying-coefficients model, by Cuevas et al. (2006) to derive 
bootstrap bands for functional location parameters, by Degras (2009, 2011) to obtain normal 
and bootstrap bands using noisy functional data, and by Cardot and Josserand (2011) in the 
context of a finite population. In the latter work, the strategy was to first establish a CLT in 
the space C and then derive confidence bands based on a simple but rough approximation 
to the supremum of a Gaussian process (Landau and Shepp, 1970). Unfortunately, the 
associated bands depend on the data-generating process only through its variance structure 
and not its correlation structure, which may cause the empirical coverage to differ from the 
nominal level. The second innovation of our paper is to propose confidence bands that are 
easy to implement and attain nominal coverage in the survey sampling/finite population 
setting. To do so we use Gaussian process simulations as in Cuevas et al. (2006) or Degras 
(2011). This procedure can be thought as a parametric bootstrap, where the parameter to 
be estimated, the covariance function, is lying in an infinite dimensional functional space. 
Our contribution is to provide the theoretical underpinning of the construction method, 
thereby guaranteeing that nominal coverage is attained asymptotically. The theory we 
derive involves maximal inequalities, random entropy numbers, and large covariance matrix 
theory. 

Finally, the implementation of the mean function estimator developed in this paper 
requires the selection of a bandwidth in the data smoothing step. Objective, data-driven 
bandwidth selection methods are desirable for this purpose. As explained by Opsomer and 
Miller (2005), bandwidth selection in the survey estimation context poses specific problems 
(in particular, the necessity to take the sampling design into account) that make usual cross- 
validation or mean square error optimization methods inadequate. In view of the model- 
assisted survey estimation of a population total, these authors propose a cross-validation 
method that aims at minimizing the variance of the estimator, the bias component being 
negligible in their setting. In our functional and design-based framework, the bias is however 
no longer negligible. We therefore devise a novel cross-validation criterion based on weighted 
least squares, with weights proportional to the sampling weights. For the particular case of 
simple random sampling without replacement, this criterion reduces to the cross validation 
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technique of Rice and Silverman (1991), whose asymptotic properties has been studied by 
Hart and Wehrly (1993). 

The paper is organized as follows. We fix notations and define our estimators in sec- 
tion 2. In section 3, we introduce our asymptotic framework based on superpopulation 
models (see Isaki and Fuller, 1982), establish a CLT for the mean function estimator in the 
space of continuous functions, and show the uniform consistency of a covariance estimator. 
Based on these results, we propose a simple and effective method for building simultaneous 
confidence bands. In section 4, a weighted cross-validation procedure is proposed for select- 
ing the bandwidth and simulations are performed to compare different sampling schemes 
and bandwidth choices. Our estimation methodology is seen to compare favorably with 
other methods and to achieve nearly optimal performances. The paper ends with a short 
discussion on topics for future research. Proofs are gathered in an Appendix. 

2 Notations and estimators 

Consider a finite population Un = {l,...,iV} of size N and suppose that to each unit 
k G Un corresponds a real function X k on [0,T], with T < oo. We assume that each 
trajectory X k belongs to the space of continuous functions C([0,T]). Our target is the 
mean trajectory /tx/v(i), t G [0,T], defined as follows: 



We consider a random sample s drawn from Un without replacement according to a 
fixed-size sampling design pn(s), where pn(s) is the probability of drawing the sample s. 
The size un of s is nonrandom and we suppose that the first and second order inclusion 
probabilities satisfy 

• TT k := F(k G s) > for all k G Un 

• n kl := ¥{kkl G s) > for all k,leU N 

so that each unit and each pair of units can be drawn with a non null probability from the 
population. Note that for simplicity of notation the subscript N has been omitted. Also, 
by convention, we write 7r k k = for all k G Un- 

Assume that noisy measurements of the sampled curves are available at d = (In fixed 
discretization points = t\ < t2 < ■ ■ ■ < td = T. For all units k G s, we observe 



where the measurement errors €j k are centered random variables that are independent across 
the index k (units) but not necessarily across j (possible temporal dependence). It is also 




(1) 



Yjk = X k (tj) + ejk 



(2) 
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assumed that the random sample s is independent of the noise ejk and the trajectories 
Xk(t),t G [0, T] are deterministic. 

Our goal is to estimate hn as accurately as possible and to build asymptotic confidence 
bands, as in Degras (2011) and Cardot and Josserand (2011). For this, we must have a 
uniformly consistent estimator of its covariance function. 

2.1 Linear smoothers and the Horvitz-Thompson estimator 

For each (potentially observed) unit k G Un, we aim at recovering the curve X k by 
smoothing the corresponding discretized trajectory (life, . . . , l^fe) with a linear smoother 
(e.g. spline, kernel, or local polynomial): 

d 

X k (t) = Y^W ] {t)Y 3k . (3) 

Note that the reconstruction can only be performed for the observed units k G s. 

Here we use local linear smoothers (see e.g. Fan and Gijbels (1997)) because of their 
wide popularity, good statistical properties, and mathematical convenience. The weight 
functions Wj(t) can be expressed as 

i- h {s 2 (t)-(t 3 -t) Sl (t)}K{^) 
Wj(t) = j— — 2777"^ "> i = 1 '--.,d, (4) 

S2{t)So{t) - Sf{t) 

where K is a kernel function, h > is a bandwidth, and 

d 

^H^^-O'^l^), ' = 0,1,2. (5) 



.7=1 V 7 



We suppose that the kernel K is nonnegative, has compact support, satisfies K(0) > and 
\K(s) — K(t)\ < C\s — t\ for some finite constant C and for all s, t G [0, T]. 
The classical Horvitz-Thompson estimator of the mean curve is 



fces 



where / fe is the sample membership indicator {I k = 1 if k G s and I k = otherwise). It 
holds that E(Jfc) = 7Tfc and E(/ fe /;) = km- 
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2.2 Covariance estimation 



The covariance function of /2jy can be written as 



for all s,i G [0, T], where 



7M 



k,ieu 



TT k TV I N ^ — ' TV k 

k&U 



with 



x k (t) =Y, d j =iW j {t)X k {t j ), 
Ut) =E-=iW r i (t)e fci , 
A fc ; = Cov(4, II) = TV k \ - TV k TV h 

A natural estimator of 7jv(s,i) is given by 



(7) 



(8) 



(9) 



(10) 



It is unbiased and its uniform mean square consistency is established in Section 3.2 



3 Asymptotic theory 

We consider the superpopulation framework introduced by Isaki and Fuller (1982) and 
discussed in detail by Fuller (2009). Specifically, we study the behaviour of the estimators 
fl n and 7at as population Un = {!>••• > A} increases to infinity with N. Recall that the 
sample size n, inclusion probabilities n k and Tv k [, and grid size d all depend on N. In what 
follows we use the notations c and C for finite, positive constants whose value may vary 
from place to place. The following assumptions are needed for our asymptotic study. 

Tl 

(Al) (Sampling design) — > c, n k > c, Tv k i > c, and n\Tv k i — TVk^l\ < C for all k, I G Un 
(k / I) and N > 1. 

(A2) (Trajectories) \X k (s) - X k (t)\ < C\s - tf and \X k (0)\ < C for all k G U N , N > 1, 
and s,t G [0,T], where f3 > \ is a finite constant. 

dfloe loe AH 

(A3) (Growth rates) c < d(t j+l - tj) < C for all 1 < j < d, N > 1, and v 6 6 — ^ -> 
as N — > oo. 

(A4) (Measurement errors) The random vectors (e^i, . . . , e k d)' , k G Un, are i.i.d. and follow 
the multivariate normal distribution with mean zero and covariance matrix V^v- The 
largest eigenvalue of the covariance matrix satisfies ||Vjv|| < C for all A > 1. 
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Assumption (Al) deals with the properties of the sampling design. It states that the 
sample size must be at least a positive fraction of the population size, that the one- and 
two- fold inclusion probabilities must be larger than a positive number, and that the two- 
fold inclusion probabilities should not be too far from independence. The latter is fulfilled 
for example for stratified sampling with sampling without replacement within each stratum 
(Robinson and Sarndal (1983)) and is discussed in details in Hajek (1981) for rejective sam- 
pling and other unequal probability sampling designs. Assumption (A2) imposes Holder 
continuity on the trajectories, a mild regularity condition. Assumption (A3) states that 
the design points have a quasi-uniform repartition (this holds in particular for equidis- 
tant designs and designs generated by a regular density function) and that the grid size is 
essentially negligible compared to the population size (for example if djy oc N a for some 
a £ (0, 1)). In fact the results of this paper also hold if d^/N stays bounded away from zero 
and infinity as N — > oo (see Section 5). Finally (A4) imposes joint normality, short range 
temporal dependence, and bounded variance for the measurement errors e^j, 1 < j < d. It 
is trivially satisfied if the e^ ~ N(0, cr|) are independent with variances Var(efcj) < C. It 
is also verified if the ekj arise from a discrete time Gaussian process with short term tem- 
poral correlation such as ARMA or stationary mixing processes. Note that the Gaussian 
assumption is not central to our derivations: it can be weakened and replaced by moment 
conditions on the error distributions at the expense of much more complicated proofs. 

3.1 Limit distribution of the Horvitz-Thompson estimator 

We now derive the asymptotic distribution of our estimator /J at in order to build asymptotic 
confidence bands. Obtaining the asymptotic normality of estimators in survey sampling is 
a technical and difficult issue even for simple quantities such as means or totals of real 
numbers. Although confidence intervals are commonly used in the survey sampling com- 
munity, the Central Limit Theorem (CLT) has only been checked rigorously, as far as we 
know, for a few sampling designs. Erdos and Renyi (1959) and Hajek (1960) proved that 
the Horvitz-Thompson estimator is asymptotically Gaussian for simple random sampling 
without replacement. The CLT for rejective sampling is shown by Hajek (1964) whereas the 
CLT for other proportional to size sampling designs is studied by Berger (1998). Recently 
these results were extended for some particular cases of two-phase sampling designs (Chen 
and Rao, 2007). Let us assume that the Horvitz-Thompson estimator satisfies a CLT for 
real- valued quantities. 



(A5) (Univariate CLT) For any fixed t G [0,T], it holds that 
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as N — > oo, where ~» stands for convergence in distribution. 

We recall here the definition of the weak convergence in C([0,T]) equipped with the 
supremum norm || • (e.g. van der Vaart and Wellner, 2000). A sequence (£jv) of random 
elements of C([0,T]) is said to converge weakly to a limit £ in C([0,T]) if E(0(£jv)) — > 
E(</>(£)) as iV — > oo for all bounded, uniformly continuous functionals <fi on (C([0, T]), || • ||oo)- 

To establish the limit distribution of fl n in C([0,T]), we need to assume the existence 
of a limit covariance function 



In the following theorem we state the asymptotic normality of the estimator /2tv in the 
space C([0,T]) equipped with the sup norm. 



in C([0, T]), where G is a Gaussian process with mean zero and covariance function 7. 



for a variety of statistical tests based on supremum norms (see Degras, 2011). 

Observe that the conditions on the bandwidth h and design size d are not very con- 
straining. Suppose for example that d oc iV and h oc N~ v for some 77, v > 0. Then d and h 
satisfy the conditions of Theorem [l] as soon as (2/3)- 1 < v < n < 1. Thus, for more regular 
trajectories, i.e. larger /3, the bandwidth h can be chosen with more flexibility. 

The proof of Theorem [T] is similar in spirit to that of Theorem 1 in Degras (2011) and 
Proposition 3 in Cardot and Josserand (2010). Essentially, it breaks down into: (i) control- 
ling uniformly on [0, T] the bias of /2at, (ii) establishing the functional asymptotic normality 
of the local linear smoother applied to the sampled curves Xk, and (iii) controlling uni- 
formly on [0,T] (in probability) the local linear smoother applied to the errors e,£. Part 
(i) is easily handled with standard results on approximation properties of local polynomial 
estimators (see e.g. Tsybakov (2009)). Part (ii) mainly consists in proving an asymptotic 
tightness property, which entails the computation of entropy numbers and the use of max- 
imal inequalities (van der Vaart and Wellner, 2000). Part (iii) requires first to show the 
finite-dimensional convergence of the smoothed error process to zero and then to establish 
its tightness with similar arguments as in part (ii). 




k,l&U N 



Theorem 1. Assume (Al)-(A5) and that 
Then 



y/NhP — > and dh/logd — > 00 as N — ^ 00. 




Theorem [T] provides a convenient way to infer the local features of fijy. It is applied in 
Section |3.3| to the construction of simultaneous confidence bands, but it can also be used 



S 



3.2 Uniform consistency of the covariance estimator 

We first note that under (A1)-(A4), by the approximation properties of local linear smoothers, 
7jv converges uniformly to 7 on [0, T] 2 as h — > and N — > 00. Hence the consistency of 7/v 
can be stated with respect to 7 instead of 7 at. In alignment with the related Proposition 2 
in Cardot and Josserand (2011) and Theorem 3 in Breidt and Opsomer (2000), we need to 
make some assumption on the two-fold inclusion probabilities of the sampling design p ^ ■ 

(A6) 

lim max \E{(I kl h 2 - TT klk2 )(I k3 I ki - 7r k:}k4 )}\ = 

where D^n is the set of all quadruples (k±, k2, k$, £4) in Un with distinct elements. 

This assumption is discussed in detail in Breidt and Opsomer (2000) and is fulfilled for 
example for stratified sampling. 

Theorem 2. Assume (A1)-(A4), (A6), and that h — > and dh l+a — > 00 for some a > 
as N —7- 00. Then 

( ,~ , 2 \ 

lim E sup 7tv(s, t) — 7(5, t)\ =0. 

N ->°° \s,te[o,T\ 2 J 
where the expectation is jointly with respect to the design and the multivariate normal model. 

Note the additional condition on the bandwidth h in Theorem 2. If we suppose, as in the 



remark in Section 3.1 that d oc N* 1 and h oc N u for some (2/3) < v < r] < 1, then 



condition dh 1+a — > 00 as — > 00 is fulfilled with e.g. a = 1 — r\j2v. 
3.3 Confidence bands 

In this section we build confidence bands for /iAr of the form 



fi N (t) ± c 



ATl/2 



te[o,T}}, (11) 



where c is a suitable number and ai\i(t) = 77v(t, i) 1 / 2 . More precisely, given a confidence 
level 1 — a G (0, 1), we seek c = c Q that approximately satisfies 

F(\G(t)\ < ca(t), Vt G [0,T]) = 1 - a, (12) 

where G is a Gaussian process with mean zero and covariance function 7, and where o~(t) = 
7(t,t) 1 / 2 . Exact bounds for the supremum of Gaussian processes have been derived for 
only a few particular cases (Adler and Taylor, 2007, Chapter 4). Computing accurate and 
as explicit as possible bounds in a general setting is a difficult issue and would require 
additional strong conditions such as stationarity which have no reason to be fulfilled in our 
setting. 
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In view of Theorems |1||2| and Slutski's Theorem, the bands defined in ( 11 ) with c chosen 

a. The following result provides a 



as in (12) will have approximate coverage level 1 



simulation-based method to compute c. 

Theorem 3. Assume (A1)-(A6) and dh l+a — > oo for some a > as N — )• oo. Let G be 
a Gaussian process with mean zero and covariance function 7. Let (Gn) be a sequence of 
processes such that for each N, conditionally on 77V, Gn is Gaussian with mean zero and 
covariance 77V defined in (10). Then for all c > 0, as N — )• oo ; the following convergence 
holds in probability: 

P (\G N (t)\ < ca N (t), Vt G [0, T] I 7/v) -> P (|G(t)| < ca(f), Vt G [0, T]) . 



Theorem |3j is derived by showing the weak convergence of (Gjv) to G in C([0, T]), which 
stems from Theorem [2] and the Gaussian nature of the processes Gjy. As in the first 
two theorems, maximal inequalities are used to obtain the above weak convergence. The 
practical importance of Theorem [3] is that it allows to estimate the number c in ( 12 ) via 
simulation: (with the previous notations), conditionally on 77V, one can simulate a large 
number of sample paths of the Gaussian process (Gn/&n) and compute their supremum 
norms. One then obtains a precise approximation to the distribution of ||Gjv/?jv[|oo) an d 
it suffices to set c as the quantile of order (1 — a) of this distribution: 



\G N {t)\ <ca N (t), Vt£ [0,T]\j N 



1 — a. 



(13) 



Corollary 1. Assume (Al)-(A6). Under the conditions of Theorems the bands 

defined in (11) with the real c = c(^n) chosen as in (13) have asymptotic coverage level 
1 — a, i.e. 



lim P fj, N (t) <G 

-/V-»oo 



<7jv(i) 

jyi/a 



Vt 6 [0, T) 



1 — a. 



4 A simulation study 

In this section, we evaluate the performances of the mean curve estimator as well as the 
coverage and the width of the confidence bands for different bandwidth selection criteria 
and different levels of noise. The simulations are conducted in the R environment. 



4.1 Simulated data and sampling designs 

We have generated a population of N = 20000 curves discretized at d = 200 and d = 400 
equidistant instants of time in [0, 1]. The curves of the population are generated so that they 
have approximately the same distribution as the electricity consumption curves analyzed 
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Figure 1: Basis functions v\ (solid line), Vi (dashed line) and V3 (dotted line). 

in Cardot and Josserand (2011) and each individual curve Xk, for h £ U, is simulated as 
follows 

3 

X k (t)=fi(t) + Y J Zev e (t), i€[0,l], (14) 
t=\ 

where the mean function \i is drawn in Figure [2] and the random variables are independent 
realizations of a centered Gaussian random variable with variance erf. The three basis 
function v\ , V2 and 1*3 are orthonormal functions which represent the main mode of variation 
of the signals, they are represented in Figure [TJ Thus, the covariance function of the 
population 7(5, t) is simply 

3 

7(s,t) = J>! «/(s)«/(t). (15) 
£=1 

To select the samples, we have considered two probabilistic selection procedures, with 
fixed sample size, n = 1000, 
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Stratum number 


1 


2 


3 


4 


5 


Stratum size 


10000 


4000 


3000 


2000 


1000 


Allocation 


655 


132 


98 


68 


47 



Table 1: Strata sizes and optimal allocations. 



• Simple random sampling without replacement (SRSWOR). 

• Stratified sampling with SRSWOR in all strata. The population U is divided into a 
fixed number of H = 5 strata built by considering the quantiles <?o.5> <Zo.7) 90.85 and go. 95 
of the total consumption f X k {£)dt for all units k G U. For example, the first strata 
contains all the units k such that f Q X k (t)dt < qo. 5, and thus its size is half of the 
population size N. The sample size n g in stratum g is determined by a Neyman-like 
allocation, as suggested in Cardot and Josserand (2011), in order to get a Horvitz- 
Thompson estimator of the mean trajectory whose variance is as small as possible. 
The sizes of the different strata, which are optimal according to this mean variance 
criterion, are reported in Table [TJ 

We suppose we observe, for each unit k in the sample s, the discretized trajectories, at 
d equispaced points, = t\ < . . . < = 1, 

Y jk = Xkitj) + 6e jk . (16) 

The parameter 5 controls the noise level compared to the true signal. We consider two 
different situations for the noise components €j k ■ 

• Heteroscedasticity. The €j k ~ N(0, 7(ij, tj)) are independent random variables whose 
variances are proportional to the population variances at time tj. 

• Temporal dependence. The €j k are stationary AR(3) processes with Gaussian innova- 
tions generated as follows 

€jk = 0.89ej_i >fe + 0.3ej_ 2 ,fc - 0.4ej_ 3ife + r] jk . 

The rjjh ~ N(0,a^) are i.i.d. and is such that E(e^ fc ) = d^ 1 Y^j=i li^j^j)- 

As an illustrative example, a sample of n = 10 noisy discretized curves are plotted in Figure 
[2] with heteroscedastic noise components and in Figure [3] for correlated noise. It should be 
noted that the observed trajectories corrupted by the correlated noise are much smoother 
than the trajectories corrupted by the heteroscedastic noise. The empirical standard devi- 
ation in the population, for these two different type of noise are drawn in Figure |4j 
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0.0 



0.2 



0.4 



0.6 



0.8 



1.0 



Time 



Figure 2: A sample of 10 curves for 5 = 0.05 in the heteroscedastic case. True trajectories 
are plotted with black lines whereas noisy observations are plotted in gray. The mean profile 
is plotted in bold line. 

4.2 Weighted cross-validation for bandwidth selection 

Assuming we can access the exact trajectories X^, k £ s, (which is the case in simulations) 
we consider the oracle-type estimator 



which will be a benchmark in our numerical study. We compare different interpolation and 
smoothing strategies for estimating the Xf., k £ s: 

• Linear interpolation of the Yjk as in Cardot and Josserand (2011). 

• Local linear smoothing of the Yjk with bandwidth h as in (|3|). 




(17) 
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0.0 0.2 0.4 0.6 0.8 1.0 

Time 

Figure 3: A sample of 10 curves for 5 = 0.05 in the autoregressive case. True trajectories are 
plotted with black lines whereas noisy observations are plotted in gray. The mean profile is 
plotted in bold line. 

The crucial parameter here is h. To evaluate the interest of smoothing and the performances 
of data-driven bandwidth selection criteria, we consider an error measure that compares the 
oracle ju s to any estimator Jl based on the noisy data Yjk, k G s, j = 1, . . . d: 

(■T 

L(p)= / (m-MVfdt. (18) 
J o 

Considering the estimator defined in Q, we denote by h orac \ e the bandwidth h that mini- 



mizes (18). The mean estimator built with bandwidth /i racie is called smooth oracle esti- 
mator. 
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Figure 4: Empirical standard deviation of the noise in the population for p = 400 discretiza- 
tion points. Standard deviation for heteroscedastic case is drawn in solid line and dashed 
line for correlated noise. 

When Ylkes n k 1 = as m SRSWOR and stratified sampling, it can be easily checked 
that fi s is the minimum argument of the weighted least squares functional 



with respect to p G L 2 ([0, T]), where the weights are Wk = {NiTk) 1 . Then, a simple and 
natural way to select bandwidth h is to consider the following design-based cross validation 




(19) 




(20) 



where 



Mjv*(*) = 2 W£k ^) 
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with new weights w^. A heuristic justification for this approach is that, given s, we have 
E €jk(Xk(tj) — /U^r fc (ij))|sJ = for j = 1, . . . , d and k G s. Thus, 

d , r 2 1 

E[WCV(h)\s]=J2 w kJ2\ E { X k(h)-^N k (h)) I s +2E[e jk (X k (t J )-jl] v k (t J ))\ S \+E[ 



fees j=i 



,2 

Xk(tj) -fi N {tj) 



+ tr(V 



A' 



fees i=i 

and, up to tr(V"7v) which does not depend on /i, the minimum value of the expected cross 
validation criterion should be attained for estimators which are not too far from fi s . 

This weighted cross validation criterion is simpler than the cross validation criteria based 
on the estimated variance proposed in Opsomer and Miller (2005). Indeed, in our case, the 
bias may be non negligible and focusing only on the variance part of the error leads to too 
large selected values for the bandwidth. Furthermore, Opsomer and Miller (2005) suggested 
to consider weights defined as follows W£k = we/(l — Wk)- For SRSWOR, since Wk = n" 1 one 



has wik = (n — 1) , so that the weighted cross validation criterion defined in (20) is exactly 
the cross validation criterion introduced by Rice and Silverman (1991) in the independent 
case. We denote in the following by h cv the bandwidth value minimizing this criterion. 

For stratified sampling, a better approximation which keeps the design-based properties 
of the estimator can be obtained by taking into account the sampling rates in the 
different strata. Assume the population U is partitioned in strata U v of respective sizes 
N u , v = 1,. . . ,H, and we sample n u observations in each U v by SRSWOR. If k £ U u , we 
have u>k = N u (Nn u )~ 1 . Thus, we take wgk = (N v — l){(iV — l)(n„ — l)} -1 for all the 
units I k in stratum U v and scale the weights for all the units £' of the sample that do 
not belong to stratum g, w^k = N(N — 1)~ 1 W£>. We denote by h wcv the bandwidth value 



minimizing (20). 



4.3 Estimation errors and confidence bands 

We draw 1000 samples in the population of curves and compare the different estimators of 
Section 4.2 with the I? loss criterion 

R{V)= C \m-Kt)) 2 dt (21) 

J 

for different values of 5 and d in ( |16[ ). For comparison, the quadratic approximation error 
for function fi by its average value, JL = T _1 /j,(t)dt, is R{p) = 3100. 

The empirical mean as well as the first, second and third quartiles of the estimation 
error R{jT) are given, when d=200, in Table [2] for the heteroscedastic noise case. Results 
for d = 400 are presented for the heteroscedastic case in Table [3] and in Table [4] for the 
correlated case. 
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SRSWOR 


Stratified sampling 


s 


h 


Mean 


1Q 


Median 


3Q 


Mean 


1Q 


Median 


3Q 


5% 


lin 


17.65 


3.08 


8.73 


23.50 


4.22 


1.44 


2.79 


5.59 




h cv 


17.65 


3.07 


8.71 


23.51 


6.49 


3.61 


5.36 


8.03 




h wcv 


17.65 


3.07 


8.71 


23.51 


4.22 


1.45 


2.78 


5.56 




^oracle 


17.65 


3.07 


8.72 


23.50 


4.22 


1.45 


2.78 


5.57 






17.60 


3.01 


8.70 


23.36 


4.17 


1.38 


2.76 


5.55 


25% 


lin 


17.69 


3.94 


8.99 


21.52 


5.26 


2.63 


4.15 


6.54 




h cv 


17.53 


3.83 


8.76 


21.53 


6.98 


4.29 


5.83 


8.47 




h wcv 


17.53 


3.83 


8.76 


21.53 


5.02 


2.39 


3.89 


6.33 




^oracle 


17.52 


3.81 


8.78 


21.52 


5.01 


2.37 


3.88 


6.27 






16.58 


2.85 


7.87 


20.01 


4.07 


1.46 


2.94 


5.28 



Table 2: (heteroscedastic noise). Estimation errors according to R(fi>) for different noise lev- 
els and bandwidth choices, with d = 200 observation times. Units are selected by SRSWOR 
or stratified sampling. 



We first note that in all simulations, stratified sampling largely improves the estimation 
of the mean curve in comparison to SRSWOR. Also, linear interpolation performs nearly 
as well as the smooth oracle estimator for large samples, especially when the noise level 
is low (5 = 5%). As far as bandwidth selection is concerned, the usual cross validation 
criterion h cv is not adapted to unequal probability sampling and tends to select too large 
bandwidth values. In particular it does not perform as well as linear interpolation for 
stratified sampling. On the other hand, our weighted cross-validation method seems effective 
for selecting the bandwidth. It produces estimators that are very close to the oracle and 
that dominate the other estimators when the noise level is moderate or high (5 = 25%). 



This is clearer when we look at criterion L(jT), defined in (18), which only focuses on 
the part of the estimation error which is due to the noise. Results are presented in Table [5] 
for d = 200 in the heteroscedastic case. For d = 400, errors are given in Table [6] in the 
heteroscedastic case and in Table [7] for correlated noise. When the noise level is high, we 
observe a significant impact of the number of discretization points on the accuracy of the 
smoothed estimators. Our individual trajectories, which have roughly the same shape as 
load curves, are actually not very smooth so that smoothing approaches are only really 
interesting, compared to linear interpolation, when the number of discretization points d is 
large enough. Finally, it also becomes clearer that a key parameter is the bandwidth value 
which has to be chosen with appropriate criteria that must take the sampling weights into 
account. When the noise level is low (5 = 5%), the error according to criterion L(ji) is 
multiplied by at least 15 in stratified sampling. 
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SRSWOR 


Stratified sampling 


s 


h 


Mean 


1Q 


Median 


3Q 


Mean 


1Q 


Median 


3Q 


5% 


lin 


18.03 


3.39 


9.24 


23.27 


4.05 


1.45 


2.86 


5.35 




h cv 


18.02 


3.38 


9.26 


23.34 


6.09 


3.24 


4.87 


7.56 




h wcv 


18.02 


3.38 


9.26 


23.34 


4.05 


1.45 


2.82 


5.40 




^oracle 


18.02 


3.38 


9.27 


23.32 


4.04 


1.43 


2.83 


5.39 






17.98 


3.35 


9.20 


23.17 


4.00 


1.39 


2.81 


5.29 


25% 


lin 


18.16 


3.89 


9.43 


22.86 


5.25 


2.85 


4.24 


6.57 




h cv 


17.55 


3.30 


8.89 


22.09 


6.45 


3.77 


5.37 


8.11 




h wcv 


17.55 


3.30 


8.89 


22.09 


4.57 


2.12 


3.49 


5.81 




^oracle 


17.55 


3.28 


8.89 


22.09 


4.56 


2.11 


3.48 


5.81 






17.04 


2.75 


8.38 


21.87 


4.04 


1.60 


3.02 


5.31 



Table 3: (heteroscedastic noise). Estimation errors according to R(jl) for different noise 
levels and bandwidth choices, with d = 400 observation times. Units are selected by SR- 
SWOR or stratified sampling. 



We now examine in Table [8j Table [9] and Table 10 the empirical coverage and the width 
of the confidence bands, which are built as described in Section 3.3 For each sample, we 
estimate the covariance function 77V and draw 10000 realizations of a centered Gaussian 
process with variance function 7/v in order to obtain a suitable coefficient c with a confi- 



dence level of 1 — a = 0.95 as explained in equation (13). The area of the confidence band 
is then 2cy/^y(t, t) dt. The results highlight now the interest of considering smoothing 
strategies combined with the weighted cross validation bandwidth selection criterion (20). 
For stratified sampling, the use of the unweighted cross validation criterion leads to empir- 
ical coverage levels that are significantly below the nominal one. It also appears that linear 
interpolation, which does not intend to get rid of the noise, always gives larger confidence 
bands than the smoothed estimators based on h wcv . As before, smoothing approaches be- 
come more interesting as the number of discretization points and the noise level increase. 
The empirical coverage of the smoothed estimator is lower than the linear interpolation 
estimator but remains slightly higher than the nominal one. 

As a conclusion of this simulation study, it appears that smoothing is not a crucial aspect 
when the only target is the estimation of the mean, and that bandwidth values should be 
chosen by a cross validation criterion that takes the sampling weights into account. When 
the goal is also to build confidence bands, smoothing with weighted cross validation criteria 
lead to narrower bands compared to interpolation techniques, without deteriorating the 
empirical coverage. Smoothing strategies which do not take account of unequal probability 
sampling weights lead to empirical coverage levels that can be far below the expected ones. 
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SRSWOR 


Stratified sampling 


5 


h 


Mean 


1Q 


Median 


3Q 


Mean 


1Q 


Median 


3Q 


5% 


lin 


16.23 


3.05 


8.67 


20.86 


4.08 


1.40 


2.88 


5.44 




h cv 


16.24 


3.07 


8.66 


20.88 


5.90 


2.99 


4.70 


7.33 




^wcv 


16.24 


3.07 


8.66 


20.88 


4.10 


1.38 


2.90 


5.47 




^oracle 


16.24 


3.06 


8.65 


20.88 


4.10 


1.38 


2.90 


5.46 




V-s 


16.19 


3.01 


8.69 


20.86 


4.04 


1.34 


2.82 


5.36 


25% 


lin 


17.18 


3.88 


9.38 


22.04 


5.22 


2.65 


4.07 


6.47 




h cv 


17.13 


3.84 


9.28 


22.02 


6.76 


3.98 


5.76 


8.32 




^wcv 


17.13 


3.84 


9.28 


22.02 


5.16 


2.59 


4.02 


6.37 




^oracle 


17.12 


3.81 


9.25 


22.02 


5.15 


2.59 


4.01 


6.37 




V-s 


16.12 


2.87 


8.17 


21.00 


4.04 


1.49 


2.94 


5.27 



Table 4: (correlated noise). Estimation errors according to R{fl) for different noise levels 
and bandwidth choices, with d = 400 observation times. Units are selected by SRSWOR or 
stratified sampling. 

5 Concluding remarks 

In this paper we have used survey sampling methods to estimate a population mean temporal 
signal. This type of approach is extremely effective when data transmission or storage costs 
are important, in particular for large networks of distributed sensors. Considering noisy 
functional data, we have built the Horvitz-Thompson estimator of the population mean 
function based on a smooth version of the sampled curves. It has been shown that this 
estimator satisfies a CLT in the space of continuous functions and that its covariance can be 
estimated uniformly and consistently. Although our theoretical results were presented in this 
paper with a Horvitz-Thompson covariance estimator, they are very likely to hold for other 
popular estimators such as the Sen- Yates-Grundy estimator. We have applied our results to 
the construction of confidence bands with asymptotically correct coverage. The bands are 
simply obtained by simulating Gaussian processes conditional on the estimated covariance. 
The problem of bandwidth selection, which is particularly difficult in the survey sampling 
context, has been addressed. We have devised a weighted cross-validation method that aims 
at mimicking an oracle estimator. This method has displayed very good performances in our 
numerical study; however, a rigorous study of its theoretical properties remains to be done. 
Our numerical study has also revealed that in comparison to SRSWOR, unequal probability 
sampling (e.g. stratified sampling) yields far superior performances and that when the noise 
level in the data is moderate to high, incorporating a smoothing step in the estimation 
procedure enhances the accuracy in comparison to linear interpolation. Furthermore, we 
have seen that even when the noise level is low, smoothing can be beneficial for building 
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SRSWOR 


Stratified sampling 


8 


h 


Mean 


1Q 


Median 


3Q 


Mean 


1Q 


Median 


3Q 


5% 


lin 


0.044 


0.041 


0.044 


0.047 


0.049 


0.046 


0.049 


0.053 




h cv 


0.044 


0.041 


0.044 


0.048 


2.520 


2.083 


2.852 


3.032 




h 


a A A A 

0.044 


A A A 1 
0.041 


A A A A 
0.044 


A A A O 

0.048 


A AC O 
0.058 


A A C A 

0.054 


A A C O 

0.058 


A A£JO 
0.062 




^oracle 


0.044 


0.041 


0.044 


0.047 


0.049 


0.045 


0.049 


0.052 


25% 


lin 


1.087 


1.011 


1.080 


1.156 


1.214 


1.134 


1.210 


1.287 




^bev 


0.905 


0.837 


0.901 


0.970 


3.155 


2.638 


3.260 


3.602 




h wcv 


0.905 


0.837 


0.901 


0.970 


1.009 


0.936 


1.004 


1.076 




^oracle 


0.898 


0.830 


0.894 


0.962 


0.990 


0.919 


0.988 


1.055 



Table 5: (heteroscedastic noise). Estimation errors according to L(jl) for different noise lev- 
els and bandwidth choices, with d = 200 observation times. Units are selected by SRSWOR 
or stratified sampling. 





SRSWOR 


Stratified sampling 


8 


h 


Mean 


1Q 


Median 


3Q 


Mean 


1Q 


Median 


3Q 


5% 


lin 


0.044 


0.042 


0.044 


0.047 


0.049 


0.047 


0.049 


0.051 






0.040 


0.038 


0.040 


0.042 


2.231 


1.612 


1.917 


2.806 




^wcv 


0.040 


0.038 


0.040 


0.042 


0.052 


0.049 


0.052 


0.055 




^oracle 


0.040 


0.038 


0.040 


0.042 


0.044 


0.041 


0.044 


0.046 


25% 


lin 


1.089 


1.030 


1.087 


1.142 


1.219 


1.155 


1.212 


1.280 






0.498 


0.462 


0.495 


0.535 


2.591 


1.932 


2.344 


3.254 




^wcv 


0.498 


0.462 


0.495 


0.535 


0.552 


0.509 


0.549 


0.594 




^oracle 


0.497 


0.460 


0.494 


0.533 


0.547 


0.505 


0.545 


0.586 



Table 6: (heteroscedastic noise). Estimation errors according to L(J2) for different noise lev- 
els and bandwidth choices, with d = 400 observation times. Units are selected by SRSWOR 
or stratified sampling. 

confidence bands. Indeed, smoothing the data leads to estimators that have higher temporal 
correlation, which in turn makes the confidence bands narrower and more stable. Our 
method for confidence bands is simple and quick to implement. It gives satisfactory coverage 
(a little conservative) when the bandwidth is chosen correctly, e.g. with our weighted cross- 
validation method. Such confidence bands can find a variety of applications in statistical 
testing. They can be used to compare mean functions in different sub-populations, or to 
test for a parametric shape or for periodicity, among others. Examples of applications can 
be found in Degras (2011). 

This work also raises some questions which deserve further investigation. A straight- 
forward extension could be to relax the normality assumption made on the measurement 
errors. It is possible to consider more general error distributions under additional assump- 
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SRSWOR 


Stratified sampling 


s 


h 


Mean 


1Q 


Median 


3Q 


Mean 


1Q 


Median 


3Q 


5% 


lin 


0.17 


0.15 


0.16 


0.18 


0.05 


0.04 


0.04 


0.05 




h cv 


0.17 


0.15 


0.16 


0.18 


1.94 


1.53 


1.59 


2.90 




u 


a 1 h 
0.17 


a 1 c 
0.15 


a 1 a 
0.16 


A 1 o 

0.18 


A AT 


A AT 
0.U7 


A AT 

0.U7 


A AO 
0.08 




^oracle 


0.17 


0.15 


0.16 


0.18 


0.07 


0.07 


0.07 


0.08 


25% 


lin 


1.09 


1.03 


1.09 


1.14 


1.20 


1.08 


1.19 


1.32 






0.50 


0.46 


0.50 


0.53 


2.83 


2.19 


2.57 


3.67 




hwcv 


0.50 


0.46 


0.50 


0.53 


1.15 


1.02 


1.13 


1.26 




^■oracle 


0.49 


0.46 


0.49 


0.53 


1.13 


1.01 


1.12 


1.25 



Table 7: (correlated noise). Estimation errors according to L(fl) for different noise levels 
and bandwidth choices, with d = 400 observation times. Units are selected by SRSWOR or 
stratified sampling. 

tions on the moments and much longer proofs. In another direction, it would be worthwhile 
to see whether our methodology can be extended to build confidence bands for other func- 
tional parameters such as population quantile or covariance functions. Also, as mentioned 
earlier, the weighted cross-validation proposed in this work seems a promising candidate for 
automatic bandwidth selection. However it is for now only based on heuristic arguments 
and its theoretical underpinning should be investigated. 

Finally, it is well known that taking account of auxiliary information, which can be 
made available for all the units of the population at a low cost, can lead to substantial 
improvements with model assisted estimators (Sarndal et al. 1992). In a functional context, 
an interesting strategy consists in first reducing the dimension through a functional principal 
components analysis shaped for the sampling framework (Cardot et. al 2010a) and then 
considering semi parametric models relating the principal components scores to the auxiliary 
variables (Cardot et al. 2010b). It is still possible to get consistent estimators of the 
covariance function of the limit process but further investigations are needed to prove the 
functional asymptotic normality and deduce that Gaussian simulation-based approaches 
still lead to accurate confidence bands. 

Acknowledgement. Etienne Josserand thanks the Conseil Regional de Bourgogne, France 
for its financial support (FABER PhD grant). 

Appendix 

Throughout the proofs we use the letter C to denote a generic constant whose value may 
vary from place to place. This constant does not depend on N nor on the arguments 
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SRSWOR 


Stratified sampling 


s 


h 


1 — 5 


Mean 


1Q 


Median 


3Q 


1 — a 


Mean 


1Q 


Median 


3Q 


5% 


lin 


97.2 


10.91 


10.74 


10.90 


11.07 


98.1 


5.95 


5.87 


5.95 


6.02 




h cv 


97.3 


10.89 


10.73 


10.89 


11.06 


47.5 


5.68 


5.60 


5.68 


5.76 




h wcv 


97.3 


10.89 


10.73 


10.89 


11.06 


97.5 


5.92 


5.84 


5.91 


6.00 




^oracle 


97.2 


10.90 


10.72 


10.90 


11.07 


98.0 


5.94 


5.86 


5.94 


6.02 




Ms 


97.3 


10.54 


10.36 


10.54 


10.70 


98.2 


5.59 


5.51 


5.60 


5.67 


25% 


lin 


97.7 


13.23 


13.06 


13.22 


13.41 


98.3 


8.27 


8.19 


8.27 


8.36 




h cv 


97.2 


12.66 


12.49 


12.65 


12.83 


64.7 


6.70 


6.60 


6.69 


6.79 




h wcv 


97.2 


12.66 


12.49 


12.65 


12.83 


97.3 


7.56 


7.48 


7.56 


7.65 




^oracle 


97.3 


12.70 


12.50 


12.70 


12.87 


97.5 


7.68 


7.58 


7.68 


7.79 






97.0 


10.53 


10.37 


10.52 


10.70 


97.7 


5.59 


5.51 


5.59 


5.66 



Table 8: (heteroscedastic noise). Empirical coverage levels 1 — 2 and confidence band areas 
for different noise levels and bandwidth choices, with d = 200 observation times. Units are 
selected by SRSWOR or stratified sampling. 



s,t £ [0, T]. Note also that the expectation E is jointly with respect to the design and the 
multivariate normal model. 

Proof of Theorem [TJ We first decompose the difference between the estimator /2j\r(i) and 
its target /Ujv(i) as the sum of two stochastic components, one pertaining to the sampling 
variability and the other to the measurement errors, and of a deterministic bias component: 

keu ^ k ' keu k k 

where X^(t) and ik{t) are defined in Q. 



Bias term. 

To study the bias term iV" 1 ^ fe (X k (t) - X k (t)) = E(ft N (t)) - (jt N (t) in ([22]), it suffices 
to use classical results on local linear smoothing (e.g. Tsybakov (2009), Proposition 1.13) 
together with the Holder continuity (A2) of the X k to see that 



sup 

*e[o,T] 



^rfe(t)-I t (t)) ^^E SU P X k [t)-X k {t) <Ch? 



(23) 



k k *e[°- T ] 

Hence, for the bias to be negligible in the normalized estimator, it is necessary that the 
bandwidth satisfy \/~NhP — > as N — > oo. 

Error term. 



We now turn to the measurement error term in (22), which can be seen as a sequence of 
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SRSWOR 


Stratified sampling 


s 


h 


1 — a 


Mean 


1Q 


Median 


3Q 


1 — a 


Mean 


1Q 


Median 


3Q 


5% 


lin 


97.7 


10.79 


10.63 


10.79 


10.95 


97.9 


6.03 


5.95 


6.02 


6.11 




h cv 


97.6 


10.76 


10.59 


10.77 


10.92 


48.4 


5.64 


5.57 


5.63 


5.72 




h wcv 


97.6 


10.76 


10.59 


10.77 


10.92 


97.6 


5.89 


5.82 


5.89 


5.97 




^oracle 


97.6 


10.76 


10.59 


10.77 


10.92 


97.6 


5.96 


5.88 


5.96 


6.04 




V-s 


97.7 


10.50 


10.33 


10.50 


10.65 


97.8 


5.60 


5.52 


5.59 


5.68 


25% 


lin 


97.6 


12.69 


12.52 


12.70 


12.86 


98.3 


8.59 


8.49 


8.59 


8.68 




h cv 


97.5 


12.47 


12.31 


12.48 


12.64 


58.1 


6.34 


6.24 


6.34 


6.44 




^wcv 


97.5 


12.47 


12.31 


12.48 


12.64 


97.6 


7.09 


7.00 


7.08 


7.17 




^oracle 


97.6 


12.47 


12.31 


12.48 


12.64 


97.8 


7.10 


7.01 


7.10 


7.19 




V-s 


97.9 


10.50 


10.33 


10.50 


10.66 


97.6 


5.59 


5.51 


5.59 


5.67 



Table 9: (heteroscedastic noise). Empirical coverage levels 1 — 
for different noise levels and bandwidth choices, with d = 400 
selected by SRSWOR or stratified sampling. 



a and confidence band areas 
observation times. Units are 



random functions. We first show that this sequence goes pointwise to zero in mean square 
(a fortiori in probability) at a rate (Ndh)^ 1 . We then establish its tightness in C([0, T]), 
when premultiplied by \/iV, to prove the uniformity of the convergence over [0, T]. 
Writing the vector of local linear weights at point t as 

w(t) = (w l (t),...,w d (t))' 

and using the i.i.d assumption (A4) on the (e^i, . . . , e^)', G f/jy, we first obtain that 



E 



Then, considering the facts that min/c 7Tfc > c by (Al), ||Vjv|| is uniformly bounded in A 
by (A4), and exploiting a classical bound on the weights of the local linear smoother (e.g. 



Tsybakov (2009), Lemma 1.3), we deduce that 

2 



E 



N 



< 



(min 7Tk)N 2 
C 



IW)H 2 ||Viv|| 



Ndh 



(24) 



We can now prove the tightness of the sequence of processes (N 1//2 ^ fc (Ik/^k) £&)• Let 
us define the associated pseudo-metric 

d*(s,t)=E(^/£^(e k (s)-e k (t))) . 

\ V A f-f T vr fc / 



k&U 
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SRSWOR 


Stratified sampling 


s 


h 


1 — 5 


Mean 


1Q 


Median 


3Q 


1 — 2 


Mean 


1Q 


Median 


3Q 


5% 


lin 


97.4 


21.33 


21.02 


21.32 


21.68 


96.9 


5.83 


5.75 


5.83 


5.90 




h cv 


97.4 


21.29 


20.94 


21.30 


21.60 


58.1 


5.69 


5.61 


5.69 


5.77 




h wcv 


97.4 


21.29 


20.94 


21.30 


21.60 


96.8 


5.79 


5.71 


5.79 


5.87 




^oracle 


97.4 


21.29 


20.94 


21.29 


21.61 


96.6 


5.79 


5.71 


5.79 


5.87 




Ms 


97.4 


20.77 


20.42 


20.76 


21.10 


97.6 


5.52 


5.44 


5.52 


5.60 


25% 


lin 


98.0 


13.51 


13.33 


13.52 


13.68 


95.7 


7.79 


7.71 


7.78 


7.86 




h cv 


97.5 


12.06 


11.88 


12.05 


12.23 


72.6 


7.16 


7.05 


7.14 


7.24 




h wcv 


97.5 


12.06 


11.88 


12.05 


12.23 


95.0 


7.53 


7.46 


7.53 


7.60 




^oracle 


97.6 


12.06 


11.88 


12.05 


12.22 


95.6 


7.58 


7.50 


7.57 


7.66 






97.2 


10.49 


10.31 


10.48 


10.66 


97.4 


5.52 


5.44 


5.51 


5.60 



Table 10: (correlated noise). Empirical coverage levels 1 — a and confidence band areas 
for different noise levels and bandwidth choices, with d = 400 observation times. Units are 
selected by SRSWOR or stratified sampling. 



We use the following maximal inequality holding for sub-Gaussian processes (van der Vaart 
and Wellner (2000), Corollary 2.2.8): 



E 



sup 

te[o,T] 



'AT ±-~< 



h 



feet/ 



< E 



1 



N — TTt. 

v feet/ K 



+ K 



y/log N(x,d e )dx, (25) 



where to is an arbitrary point in [0, T] and the covering number N(x,d e ) is the minimal 
number of cf e -balls of radius x > needed to cover [0,T]. Note the equivalence of working 
with packing or covering numbers in maximal inequalities, see ibid p. 98. Also note that 
the sub-Gaussian nature of the smoothed error process iV -1 / 2 Y2keu i^k/^k) efc stems from 
the i.i.d. multivariate normality of the random vectors (e^i, . . . , e^)' and the boundedness 
of the Ik for k £ Un- 



By the arguments used in ( 24 ) and an elementary bound on the increments of the weight 
function vector W (see e.g. Lemma 1 in Degras (2011)), one obtains that 



N *■ — ' 7Tt 

keu K 



< 



mm 7Tfc 



-\\W(s)-W(t)\\ 2 \\V N \ 



< 



c_ 

dh 



h 2 



A 1 



(26) 



It follows that the covering numbers satisfy 

N{x,d e ) = 1, 



N(x,d e ) < 



ic 



hv dhx 



if < x 2 
11 dh — x ' 

if £ > x2 - 
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Plugging this bound and the pointwise convergence (24) in the maximal inequality (25), 



we get after a simple integral calculation (see Eq. (17) in Degras (2011) for details) that 



E 



sup 

te[o,T] 



h 



v keu K 



hit) 



^Th +C 



\o Z (h)\ 



dh 



(27) 



Thanks to Markov's inequality, the previous bound guarantees the uniform convergence in 
probability of N~ l l 2 Ylkeui-^k/^k^k ^° zero > provided that | log(h)\ / (dh) — > as N — > oo. 
The last condition is equivalent to \og(d)/(dh) — > by the fact that dh — > oo and by the 
properties of the logarithm. 



Main term: sampling variability. 

Finally, we look at the process iV _1 X^fcet/ iJkj^k — 1) Ajc in (22), which is asymptotically 
normal in C([0,T]) as we shall see. We first establish the finite-dimensional asymptotic 
normality of this process normalized by y/N, after which we will prove its tightness thanks 
to a maximal inequality. 

Let us start by verifying that the limit covariance function of the process is indeed the 



function 7 defined in Section 3.1. The finite-sample covariance function is expressed 



E 



We 

V k&U 



h 



1 X k {s) 



i<=u 



-y 



N * — ' VTI.VT/ 
k,l£U 

T^k^l 



N 



k,ieu 

7 (s,t)+o(l) + 0(hP). 



(28) 



To derive the previous relation we have used the facts that 



max sup 

k,leU s ,te[o,T] 



X^Xtit) - X k {s)Xi(t) 



< Ch 13 



by (23) and the uniform boundedness of the X k arising from (A2) and that, by (Al), 

J_ \ ~« I Afc;| 1^ \ - [Afc^j 1^ \ - A fcfc 

N 7T fc 7T/ ~ N TTuTT, N ^— ' 7T? 

k,leU K 1 k£l k k 

1 N{N- l)max fc> i(n|A w |) , 1 ^ 1 - ir k 



< 



N 



n 



+ aE 



< c. 



TTfc 



(29) 



We now check the finite-dimensional convergence of N 1 ^ 2 Ylkeu i^k/^k — 1) X k to a 
centered Gaussian process with covariance 7. In light of the Cramer- Wold theorem, this 
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convergence is easily shown with characteristic functions and appears as a straightforward 
consequence of (A5). It suffices for us to check that the uniform boundedness of the tra- 
jectories Xk derived from (A2) is preserved by local linear smoothing, so that the are 
uniformly bounded as well. 

It remains to establish the tightness of the previous sequence of processes so as to obtain 
its asymptotic normality in C([0, T]). To that intent we use the maximal inequality of the 
Corollary 2.2.5 in van der Vaart and Wellner (2000). With the notations of this result, we 
consider the pseudo-metric d 2 ^{s,t) = EjiV -1 / 2 J2keu(^/^k - l)(Xk(s) - X k (t))} 2 and the 
function ij)(t) = t 2 for the Orlicz norm. We get the following bound for the second moment 
of the maximal increment: 



E<^ sup 

I djj-(s,t)<5 



N ^ 

k&J 



(I" 1 ) (*«-*»«) 



2 



(30) 



<C( I ^- 1 (Af(x,^))dx + ^ 1 (iV 2 (r?,dx)) 



for any arbitrary constants r), 5 > 0. Observe that the maximal inequality (30) is weaker 



than ( 25 ) where an additional assumption of sub-Gaussianity is made (no log factor in the 



integral above). Employing again the arguments of (28), we see that 



N kl nm 

< £.N(N-l) C , 

~ N 2n 1 1 N 1 

<C\s-t\ 2p . (31) 

It follows that the covering number satisfies N(x,d-^) < Cx~ 1 ^ and that the integral in 
(30} is smaller than C J^x~°- 5 ^dx = Ct] 1 " - 5 ^ , which can be made arbitrarily small since 



(3 > 0.5. Once r] is fixed, 5 can be adjusted to make the other term in the right-hand side 
of ( |30| ) arbitrarily small as well. With Markov's inequality, we deduce that the sequence 
(N~ 1 / 2 Ylk&u (Ik/ftk — 1) Xk)N>i is asymptotically cT^-equicontinuous in probability (with 
the terminology of van der Vaart and Wellner (2000)), which guarantees its tightness in 
C([0,T]). □ 

Proof of Theorem [H 

To establish the uniform convergence of the covariance estimator, we first show its mean 
square convergence in the pointwise sense. Then, we extend the pointwise convergence to 
uniform convergence through an asymptotic tightness argument (that is, by showing that 
for N large enough, the covariance estimator lies in a compact K of C([0, T] 2 ) equipped 
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with the sup- norm with probability close to 1). We make use of maximal inequalities to 
prove the asymptotic tightness result. 

Mean square convergence. 

We first decompose the distance between 7ry(s, t) and its target ^/j^(s,t) as follows: 



+ i ^ ^M(i fe(s)£ - (t)+ i i(t)£ - t(s) 



fciief/ VrfeVTi 7T H 

1 ^ A fc i 

N k%J ™ nkl 

keu k 

:= A 1>N + A 2 , N + A^ N - A 4 , N - (32) 

To establish the mean square convergence of (7Ar(s,t) — 7/v(,s,i)) to zero as N — > oo, it 
is enough to show that K(A? N ) — > for i = 1, . . . , 4, by the Cauchy-Schwarz inequality. 
Let us start with 

E ( A U) = 4 E E E{m ~ 7Tkl)(Ik ' 11 ' = ^ ^(^(o^ wet). (33) 

' iV z z — /z — ' 7T 1,71771" I,/ 7T// T^kl^k'l 1 

kl k' V 

It can be shown that this sum converges to zero by strictly following the proof of the 
Theorem 3 in Breidt and Opsomer (2000). The idea of the proof is to partition the set 



of indexes in (33) into (i) k = I and k! = I', (ii) k = I and k' ^ I' or vice- versa, (iii) 
k I and k' ^ /', and study the related subsums. The convergence to zero is then handled 
with assumption (Al) (mostly) in case (i), with (A1)-(A6) in case (iii), and thanks to the 
previous results and Cauchy-Schwarz inequality in case (ii). More precisely, it holds that 

¥(A 2 s < Cmax k ^n\A kl \ C 
1 ^ " (min7r fc ) 4 n ^ (min7r fe ) 3 iV 

/ C(max/t-^ n\ Am\)N \ 2 , , 

+ -j—. T^-j—. r- max \E{{I k Ii - -K k i)(Ik'h' ~ Trjfe/j')} • 

V(mm7r fc ) 2 (mm A: ^7rH)ny (k,l,k' ,l>)eD i>N 1 1 
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For the (slightly simpler) study of E(^4| N ), we provide an explicit decomposition: 
n^l N ) = -^2 E E ^^Xk&Xk'it) Wkh'h) E(e,(*)eK*)) 



4 E ^(^)^(i)E(e fe ( S )e,(i)) 

8 A A ( 35 ) 



+ 4E E ^fc(s)^fc'(*)E(/fc/fc'/i)E(€ l (8)€,(*)). 

iV z ^k^k'^f^kl^k'! 
k,k' l<£{k,k>} K ft z fti ft 1 

Note that the expression of E(A?, ^y) as a quadruple sum over k,l,k',l' 6 Z7tv reduces to 
a triple sum since E(q(s)e^(i)) = if i ^ Z' by (A4). Also note that |E(J fc / fe /J z )| < 1 
for all k,k',l G Z7. With (Al), (A2), and the bound \E(e k (s)e k (t))\ = \W{s)'V N W(t)\ < 
\\W(s)\\\\V N \\\\W(t)\\ < C/(dh), it follows that 

tw/i2 \ ^ C N ||~V~7v || CiV 2 max fc ^ fc /n|A fefc /| ||Vjv|| 
JM^2JvJ S -T^r— 77 1 



A^ 2 dh N 2 n dh 

CN 3 (max fc ^n|A M |) 2 ||Vjy|| C 

A^ 2 n 2 dh Ndh [ ' 



We turn to the evaluation of 

iuv/i2 \ 1 ^ki^yv E(4//4^/) / \~ 

E(^ 3 ,Jv) = T72 >^ E(e*(s)€i(t)e fc /(a)ej#(t)). 

ktfc'V 7rk7Tl7rk ' 7Tl ' ^kl^k'V 

We use the independence (A4) of the errors across population units to partition the above 
quadruple sum E(^4| N ) according to the cases (i) k = I, k 1 = I' ', k ^ k' , (ii) = k! = Z, 
and 7^ fc', (hi) k = k' , I = I', and k ^ I, and (iv) k = I = k' = I' . Therefore, 

E (4iv) = ^ E -?T- + ^f*) E( e ~,( S ) e ~^))E(M*)M*)) 

+ 4 E - W-E^W) E(e 2 (i)) + ^E^f E ( e K s ) e K*))- (37) 



Forgoing the calculations already done before, we focus on the main task which for 
this term is to bound the quantity E(e|(s)e|(t)) (recall that E(efc(s)e&(i)) < C/(dh) as 
seen before). We hrst note that E(^(a)e|(t)) < {E(e^(s))} 1/2 {E(e^))} 1/2 . Writing e ~ 
N{0,V N ), it holds that E(e£(t)) = E((VF(t)'e) 4 ) = 3(W(t)'VtfW(i)) 2 by the moment 
properties of the normal distribution. Plugging this expression in (37), we find that 

E ^.»> £ + W (38) 

Finally, like E(efc(s)efc(i)), the deterministic term is of order l/(dh). 
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Tightness. 

To prove the tightness of the sequence (7^ — 7at)tv>i in C([0,T] 2 ), we study separately 



each term in the decomposition (32) and we call again to the maximal inequalities of van 
der Vaart and Wellner (2000). 

For the first term = A\^(s, t), we consider the pseudo-metric d defined as the L 4 - 
norm of the increments: df((s, t), (s 1 , t')) = K\A\ j iy(s,t) — Ai^(s' ,t')\ 4 . (The need to use 
here the L 4 -norm and not the usual L 2 -norm is justified hereafter by a dimension argument.) 
With (Al)-(A2) and the approximation properties of local linear smoothers, one sees that 



(hh 



1 



<C(\s-s'\P + \t-t'\P 



Hence dt(s,t) <C(\s- s'f + \t- t'f) and for all x > 0, the covering number N(x, d±) is no 
larger than the size of a two-dimensional square grid of mesh x 1 ^, i.e. N(x,di) <Cx' 2 ^. 
(Compare to the proof of Theorem 1 where, for the main term A" -1 / 2 ^2 k (Ik/^k)Xk, we 
have N(x,dj() < Cx _1// ^ because the index set [0, T] is of dimension 1.) Using Theorem 
2.2.4 of van der Vaart and Wellner (2000) with ip(t) = t 4 , it follows that for all 77, 6 > 0, 

E J sup \A 1>N (s, t) - A hN (s', t')A<c( P tp~ 1 (N(x, d{))dx + ^ _1 (A 2 (r/, d x )) 

ydi{{s,t),(s',t'))<5 J \Jo 

,1-0.5//? , x„-l//9> 4 



< c ^-v*/p + Sif 

The upper bound above can be made arbitrarily small by varying r] first and 5 next since 
/3 > 0.5. Hence, with Markov's inequality, we deduce that the processes A-y^ are tight in 
C([0,T] 2 ). 

The bivariate processes {A2 n) n>i are sub-Gaussian for the same reasons as the univari- 
ate processes N~ 1 / 2 Ylkeu i^k/^k) ^k are in the proof of Theorem 1, namely the indepen- 
dence and multivariate normality of the error vectors (eki, . . . , ekd)' and the boundedness of 
the sample membership indicators I k for k S Un- Therefore, although the covering num- 
ber N(x, dq) grows to 0(x~ 2 ^) in dimension 2, with c?2 being the L 2 -norm on [0,T] 2 , this 
does not affect significantly the integral upper bound J °° y / log(N(x, d2)dx in a maximal 
inequality like (25). As a consequence, one obtains the tightness of (A^n) in C([0,T] 2 ). 

To study the term AsN(s,t) in (32), we start with the following bound: 

N Tl T^kl 2 

k,l 




k 
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The two-dimensional study is thus reduced to an easier one-dimensional problem. 

To apply the Corollary 2.2.5 of van der Vaart and Wellner (2000), we consider the 
function ip(t) = t m and the pseudo-metric d™(s,t) = ^N^ 1 ^ k (e 2 k (s) - e|(t))| m , where 
m > 1 is an arbitrary integer. We have that 



E <^ sup 



j^E®") \^ C (l V(*,*)) 1/m «k) (39) 



where Dt = sup s } te[0,T] ^{s, t) is the diameter of [0, T] for d^. Using the classical inequality, 
Efc=i a k\ m — n m ~ x Ylk=i \ a k\ m , for m > 1 and arbitrary real numbers a±, . . . , a n , we get, 
with the Cauchy-Schwarz inequality and the moment properties of Gaussian random vectors, 
that 



^(M)<^£ E i3to-3(*)r 

k 



< - > {e \e k (s) - e k (t)\ 2m ] {E |e fc ( S ) + e k (t)\ 2m ] 



N ^ 

k 

a 



< : - Y^WW(s) -W(t)\\^ N \\W(s) + W(t)\\ 



k 



< 7 ^f^-^Al) . (10) 



where ||x||v N = (x'Vatx) 1 / 2 and C m and are constants that only depend on 



m. 



We deduce from (40) that the diameter Dt is at most of order l/{dh) and that for all 
< x < l/(dh), the covering number N(x,ds) is of order l/(xdh 2 ). Hence the integral 
bound in ^ is of order f^ /{dh] \dh 2 x)~ 1 / m dx < C^/i 2 )- 1 /" 1 ^)^ 1 /™)) = C/(dh) l+1 / m . 



Therefore, if dh 1+a — > oo for some a > 0, the sequence (N 1 ^2 k (^ k ))N>i tends uniformly 
to zero in probability which concludes the study of the term (^aOtv^i and the proof. □ 

Proof of Theorem |$l 

We show here the weak convergence of (Gat) to G in C([0, T]) conditionally on 7jy. This 
convergence, together with the uniform convergence of to 7 presented in Theorem [2j is 
stronger than the result of Theorem [3] required to build simultaneous confidence bands. 

First, the finite-dimensional convergence of (Gat) to G conditionally on 77V is a trivial 
consequence of Theorem [2} 

Second, we show the tightness of (Gn) in C([0,T]) (conditionally on 7^) similarly to 
the study of (^3,iv) in the proof of Theorem [2] We start by considering the random pseudo- 
metric d™(s,t) = E[(G/v(s) — GN(t)) m \jn], where m > 1 is an arbitrary integer. By the 



30 



moment properties of Gaussian random variables and by (Al), it holds that 



d?{s,t) = C„ 
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k,ieu 
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k,ieu 
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k,ieu 







|A fci 
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T^kl 






1 4/i 







(X k (s) - Xkit))^) - Xt(t)) 

-| m/2 

[X fc ( S )-X fe (t)f 



m/2 



|Ajfcj| 7jfc/j 



N k% nkl ™ 



(e fc (s) - e k (t))" 



m/2 



±J2(Xk(s)-X k (t)f 



m/2 



±Y,(tk(s)-h(t)f 



m/2 



(41) 



Note that the the value of the constant C m varies across the previous bounds. Clearly, the 



first sum in the right-hand side of (41) is dominated by \s — t\ m P thanks to (A2) and the 
approximation properties of local linear smoothers. The second sum can be viewed as a 



1/2 ! 

random quadratic form. Denoting a square root of V/v by Vjy , we can write e k as zj k 
for k = 1,...,N (the equality holds in distribution), where the Z k are i.i.d. centered 
(i-dimensional Gaussian vectors with identity covariance matrix. Thus, 



rl/2, 



±Y,(tk(s)-~e k (t)) 2 = (W( S )-W(t))' | 
k 

<\\W(s)-W(t)\\ 2 
<\\W(s)-W(t)\\ 2 



N S 

k 

k 



e k e' k (W(s) - W{t)) 



IV 



N I 



(42) 



Now, the vector norm \\W(s) - W(t)\\ z has already been studied in p61 and the sequence 
(||Vjv||) is bounded by (A4). The remaining matrix norm in (42) is smaller than the largest 
eigenvalue, up to a factor iV -1 , of a d-variate Wishart matrix with iV degrees of freedom. 
By (A3) it holds that d = o{N/ log log N) and one can apply Theorem 3.1 in Fey et al. 
(2008), which states that for any fixed a > 1, 



lim 

N^oo 



llogP^ l^Z fc Z' fc >aj =i(a-l-loga). 



(43) 



An immediate consequence of @ is that H^EfcZfcZfcll 

remains almost surely bounded as 
N —7- oo. Note that the same result holds if instead of (A3), (d/N) remains bounded away 
from zero and infinity, thanks to the pioneer work of Geman (1980) on the norm of random 
matrices. Thus, there exists a deterministic constant C £ (0, oo) such that 



d?(s,t) < C\s-t\ 



+ 



C 



(dh) m / 2 V h 



A 1 



(44) 



31 



for all s,t e [0, T], with probability tending to 1 as TV — > oo. Similarly to the previous 
entropy calculations, one can show that there exists a constant C G (0, oo) such that 
N(x,dy) < C(ar 1//3 + (d/i 3 )" 1 / 2 ^ 1 ) for all x < (dh)- 1 with probability tending to 1 as 
N — > oo. Applying the maximal inequality of van der Vaart and Wellner (2000) (Th. 2.2.4) 
to the conditional increments of Gn, with (f)(t) = t m (usual L m -norm), one finds a covering 
integral (N(x, d^f^dx of the order of (dh) 1 /^' 1 + (dh z )~ l ^ 2m \dh) l l m - 1 . Hence 

l + l/(2m) 

the covering integral tends to zero in probability, provided that h — > and d/t 1 " 1 /^™.) — > oo 
as TV — > oo. Obvisouly, the latter condition on h holds for some integer m > 1 if dh 1+a — > oo 
for some real a > 0. Under this condition, the sequence (Gn) is tight in C([0, T]) and 
therefore converges to G. □ 
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